Asymptotic and Finite-Time Guarantees for Langevin-Based Temperature Annealing in InfoNCE

Chaudhry, Faris

arXiv.org Machine Learning

The InfoNCE loss in contrastive learning depends critically on a temperature parameter, yet its dynamics under fixed versus annealed schedules remain poorly understood. We provide a theoretical analysis by modeling embedding evolution under Langevin dynamics on a compact Riemannian manifold. Under mild smoothness and energy-barrier assumptions, we show that classical simulated annealing guarantees extend to this setting: slow logarithmic inverse-temperature schedules ensure convergence in probability to a set of globally optimal representations, while faster schedules risk becoming trapped in suboptimal minima. Our results establish a link between contrastive learning and simulated annealing, providing a principled basis for understanding and tuning temperature schedules.
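As a concrete illustration (ours, not the paper's implementation), the sketch below computes a per-anchor InfoNCE loss over unit-normalized embeddings together with a logarithmic inverse-temperature schedule beta_t = beta_0 + c*log(1 + t); the names and constants are illustrative, and the convergence guarantee additionally requires the rate constant c to dominate the relevant energy barriers.

```python
import numpy as np

def info_nce_loss(z_anchor, z_positive, z_negatives, temperature):
    """Per-anchor InfoNCE loss: -log softmax of the positive similarity.

    z_anchor, z_positive: (d,) embeddings; z_negatives: (k, d).
    Embeddings are assumed L2-normalized, i.e. points on the unit
    sphere (a compact Riemannian manifold, as in the paper's setting).
    """
    pos = z_anchor @ z_positive / temperature      # scalar logit
    neg = z_negatives @ z_anchor / temperature     # (k,) logits
    logits = np.concatenate(([pos], neg))
    m = logits.max()                               # stabilized log-sum-exp
    return -pos + m + np.log(np.exp(logits - m).sum())

def log_temperature_schedule(step, beta0=1.0, c=1.0):
    """Slow annealing: inverse temperature beta_t = beta0 + c*log(1 + t),
    so the temperature tau_t = 1/beta_t decays logarithmically -- the
    schedule class covered by the classical annealing guarantee."""
    return 1.0 / (beta0 + c * np.log1p(step))
```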




Discovering Causal Relationships using Proxy Variables under Unmeasured Confounding

Wu, Yong, Fu, Yanwei, Wang, Shouyan, Wang, Yizhou, Sun, Xinwei

arXiv.org Machine Learning

Inferring causal relationships between variable pairs in observational studies is crucial but challenging due to the presence of unmeasured confounding. While previous methods employed negative controls to adjust for the confounding bias, they were either restricted to the discrete setting (i.e., all variables are discrete) or relied on strong assumptions for identification. To address these problems, we develop a general nonparametric approach that accommodates both discrete and continuous settings for testing causal hypotheses under unmeasured confounders. Using only a single negative control outcome (NCO), we establish a new identification result based on a newly proposed integral equation linking the outcome and the NCO, requiring only completeness and mild regularity conditions. We then propose a kernel-based testing procedure that is more efficient than existing moment-restriction methods, and we derive the asymptotic level and power properties of our tests. Furthermore, we examine cases where the NCO-only procedure fails to achieve identification, and introduce a new procedure that incorporates a negative control exposure (NCE) to restore identifiability. We demonstrate the effectiveness of our approach through extensive simulations and real-world data from the Intensive Care Data and World Values Survey.
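For intuition only (this is our generic sketch, not the authors' procedure), a kernel moment-restriction test of the kind the abstract compares against can be written as a degenerate V-statistic: residuals that should have zero conditional mean under the null hypothesis are aggregated through an RBF kernel on the conditioning variable.

```python
import numpy as np

def rbf_kernel(Z, bandwidth=1.0):
    """Gaussian kernel matrix K[i, j] = exp(-||z_i - z_j||^2 / (2*h^2))."""
    sq = np.sum(Z ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def kernel_moment_statistic(residuals, Z, bandwidth=1.0):
    """V-statistic T = (1/n^2) * sum_{i,j} r_i * r_j * k(z_i, z_j).

    Under the null E[r | Z] = 0 and with a characteristic kernel, T
    concentrates near zero; large values reject the moment restriction.
    Calibration of the null distribution (e.g., via a wild bootstrap)
    is omitted in this sketch.
    """
    r = np.asarray(residuals, dtype=float)
    Z = np.asarray(Z, dtype=float)
    if Z.ndim == 1:
        Z = Z[:, None]
    K = rbf_kernel(Z, bandwidth)
    return float(r @ K @ r) / len(r) ** 2
```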


The sequence of distributions that converges weakly to π

Neural Information Processing Systems

We are very grateful to all the reviewers for their thoughtful feedback. All typos and minor points will also be fixed. Prop. 3 implies that any inference problem can be decomposed into a sequence of [...]. Another consideration, as highlighted by the example of 4.3, is that reducing the Bayesian computation [...], as the two methods have different computational cost patterns. This is required for each optimization step as well. Currently, however, we haven't found problems where the basis derived from H [...]. In the discussion after Prop. 1, we should have [...]. The phrase "lack of precision" in 4.4 refers to the finite number of samples drawn from [...].


Asymptotic behavior of eigenvalues of large rank perturbations of large random matrices

Afanasiev, Ievgenii, Berlyand, Leonid, Kiyashko, Mariia

arXiv.org Artificial Intelligence

Random Matrix Theory (RMT) is a classical theory that has been developing for more than 70 years. Initially, RMT arose from problems in nuclear physics, and it has since found applications in mathematics, physics, finance, and many other disciplines. Recently, new problems have been arising from the area of Machine Learning. Indeed, the weight matrices of Deep Neural Networks (DNNs) are often initialized randomly. Moreover, modern DNNs have large weight matrices, which is why their spectral properties can be described by the asymptotic behavior of N × N random matrices as N goes to infinity.
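A minimal numerical illustration of this asymptotic picture (our addition, not from the paper): for a large symmetric matrix with independent entries of variance 1/N, the empirical spectrum concentrates on the interval [-2, 2], the support of the Wigner semicircle law.

```python
import numpy as np

# Eigenvalues of a large symmetric random matrix with entry variance 1/N:
# by the Wigner semicircle law, the spectrum fills [-2, 2] as N -> infinity.
N = 2000
A = np.random.default_rng(0).normal(size=(N, N)) / np.sqrt(N)
W = (A + A.T) / np.sqrt(2.0)     # symmetrize; off-diagonal variance stays 1/N
eigs = np.linalg.eigvalsh(W)
print(eigs.min(), eigs.max())    # close to -2 and 2 for large N
```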



Statistical and Topological Properties of Sliced Probability Divergences

Neural Information Processing Systems

We can now prove Theorem 1. Proof of Theorem 1. Now, let us prove the other implication, i.e., Theorem 2. Our result is thus consistent with the existing results in the literature. Next, we show that this result holds for two popular choices of kernels. We conclude that the kernel k̂ is positive definite, hence (S17) holds for RBF kernels. S1.4 Proof of Theorem 3. We start by upper bounding the distance between two regularized measures. The desired result is obtained as a direct application of Theorems 2 and 3. S1.6
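For context (our sketch, not part of the excerpt), a sliced divergence averages a one-dimensional divergence over random projection directions; the standard Monte Carlo estimator of the sliced p-Wasserstein distance between equal-size samples looks as follows.

```python
import numpy as np

def sliced_wasserstein(X, Y, n_projections=100, p=2, seed=None):
    """Monte Carlo estimate of the sliced p-Wasserstein distance between
    empirical measures on R^d (X and Y must hold equally many samples):
    project both samples onto random unit directions, apply the sorted
    one-dimensional W_p formula on each slice, and average.
    """
    rng = np.random.default_rng(seed)
    thetas = rng.normal(size=(n_projections, X.shape[1]))
    thetas /= np.linalg.norm(thetas, axis=1, keepdims=True)
    total = 0.0
    for theta in thetas:
        x, y = np.sort(X @ theta), np.sort(Y @ theta)
        total += np.mean(np.abs(x - y) ** p)   # 1-D W_p^p of this slice
    return (total / n_projections) ** (1.0 / p)
```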



Control, Optimal Transport and Neural Differential Equations in Supervised Learning

Phung, Minh-Nhat, Tran, Minh-Binh

arXiv.org Artificial Intelligence

From the perspective of control theory, neural differential equations (neural ODEs) have become an important tool for supervised learning. In the fundamental work of Ruiz-Balet and Zuazua (SIAM Review, 2023), the authors pose an open problem regarding the connection between control theory, optimal transport theory, and neural differential equations. More precisely, they ask how one can quantify the closeness of the optimal flows in neural transport equations to the true dynamic optimal transport. In this work, we propose a construction of neural differential equations that converge to the true dynamic optimal transport in the limit, providing a significant step toward solving the aforementioned open problem.
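To fix ideas (a minimal sketch in our own notation, not the paper's construction), a neural ODE transports inputs through a flow such as dx/dt = tanh(W(t)x + b(t)); with piecewise-constant controls and explicit Euler steps this reads:

```python
import numpy as np

def neural_ode_flow(x0, weights, biases, dt=0.05):
    """Explicit Euler discretization of dx/dt = tanh(W(t) x + b(t)).

    weights, biases: piecewise-constant controls, one (d, d) matrix and
    one (d,) vector per time step. The controls play the role of the
    velocity field transporting the input measure, which is what the
    comparison with dynamic optimal transport is about.
    """
    x = np.asarray(x0, dtype=float)
    for W, b in zip(weights, biases):
        x = x + dt * np.tanh(W @ x + b)
    return x
```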